Lexical Generalisation for Word-level Matching in Plagiarism Detection
Authors
Abstract
Plagiarism has long been a concern in many sectors, particularly in education. With the sharp rise in the number of electronic resources available online, an increasing number of plagiarism cases has been observed in recent years. Because the volume of source material is vast, plagiarism detection tools have become the norm for aiding the investigation of possible plagiarism cases. This paper describes an approach to improving plagiarism detection by incorporating a lexical generalisation technique. The goal is to identify plagiarised texts even when they have been paraphrased using different words. Experiments performed on a subset of the PAN'10 corpus show that the matching approach involving lexical generalisation yields promising results compared to standard n-gram matching.
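The contrast between standard n-gram matching and matching after lexical generalisation can be illustrated with a minimal sketch. The abstract does not specify how the generalisation is performed, so everything here is an assumption made for illustration: the `GENERALISATION` table is a hypothetical toy lexicon that maps words to a shared class label, and the containment score over word trigrams is one common way to compare n-gram sets, not necessarily the measure used in the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (an assumption, not taken from the paper):
# each word maps to a shared class label so that synonyms compare as equal.
GENERALISATION: Dict[str, str] = {
    "car": "vehicle", "automobile": "vehicle",
    "quick": "fast", "rapid": "fast",
    "big": "large", "huge": "large",
}

PUNCTUATION = ".,;:!?\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def generalise(tokens: List[str]) -> List[str]:
    """Replace each token with its class label; unknown words stay unchanged."""
    return [GENERALISATION.get(token, token) for token in tokens]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious: str, source: str, n: int = 3,
                use_generalisation: bool = False) -> float:
    """Fraction of the suspicious text's n-grams that also occur in the source."""
    susp_tokens = tokenize(suspicious)
    src_tokens = tokenize(source)
    if use_generalisation:
        susp_tokens = generalise(susp_tokens)
        src_tokens = generalise(src_tokens)
    susp_grams = ngrams(susp_tokens, n)
    if not susp_grams:
        return 0.0
    return len(susp_grams & ngrams(src_tokens, n)) / len(susp_grams)


source_text = "The quick car drove past the big house."
paraphrase = "The rapid automobile drove past the huge house."

plain_score = containment(paraphrase, source_text)
generalised_score = containment(paraphrase, source_text, use_generalisation=True)
print(f"plain trigram containment:       {plain_score:.2f}")
print(f"generalised trigram containment: {generalised_score:.2f}")
```

In this toy pair, only one of the six trigrams survives the word substitutions under plain matching, while every trigram matches once both texts are mapped to class labels. A real lexicon would be far larger and would also raise the risk of false matches between unrelated texts, a trade-off this sketch does not measure.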
Similar resources
Methods for Detecting Paraphrase Plagiarism
Paraphrase plagiarism is one of the difficult challenges facing plagiarism detection systems. Paraphrasing occurs when texts are lexically or syntactically altered to look different but retain their original meaning. Most plagiarism detection systems (many of which are commercial) are designed to detect word co-occurrences and light modifications, but are unable to detect severe semantic ...
The Use of Machine Semantic Analysis in Plagiarism Detection
Plagiarism detection systems have been known for years in the university community. However, most existing detectors for natural language texts use rather simple comparison methods that make instances of plagiarism easy to hide. Software designed for plagiarism detection in computer programs utilizes far more advanced techniques. We propose a method, which adds functionalities si...
Fuzzy-Semantic Similarity for Automatic Multilingual Plagiarism Detection
A word may have multiple meanings or senses. This can be modeled by considering that each word in a sentence has a fuzzy set containing words with similar meaning, which makes detecting plagiarism a hard task, especially when dealing with semantic meaning, and even harder for cross-language plagiarism detection. Arabic is known for its richness and the diversity of its word constructions and meanings, hence c...
JADE based Virtual Checker to Avoid Plagiarism in MOOC's
MOOCs (Massive Open Online Classes) have ushered in a new era of learning, overcoming the boundaries of time and geography to provide high-quality education to the masses who cannot afford university education. The major drawback, however, is that although they have tried their best to capture the classroom environment, facets such as teacher-student interaction and the usefulness of assignments as a so...
Semantic Sequence Kin: A Method of Document Copy Detection
String matching and the global word frequency model are two basic models of Document Copy Detection, although both are unsatisfactory in some respects. The String Kernel (SK) and Word Sequence Kernel (WSK) can map string pairs directly into a new feature space in which the data is linearly separable. This idea inspired the Semantic Sequence Kin (SSK), and we apply it to document copy ...